Apparatus and method for focal length adjustment and depth map determination

ABSTRACT

A method for obtaining a depth map of a scene includes capturing a plurality of scene images of the scene, calculating a disparity of corresponding pixels in the scene images by optimizing a global energy function based on dynamic planning in a plurality of directions, and obtaining the depth map based on the disparity. The depth map is used to control a mobile platform.

CROSS-REFERENCE TO RELATED APPLICATIONS

This is a continuation application of application Ser. No. 15/701,041, filed on Sep. 11, 2017, which is a continuation application of International Application No. PCT/CN2015/074336, filed on Mar. 16, 2015, the entire contents of both of which are incorporated herein by reference.

FIELD

The disclosed embodiments relate generally to digital imaging and more particularly, but not exclusively, to apparatus and methods for adjusting a focal length automatically and/or determining a depth map for an image.

BACKGROUND

Stereoscopic imaging, a technique whereby multiple imaging devices are used to form a three-dimensional image through stereopsis, is becoming increasingly common in many fields. Stereoscopic imaging is particularly useful in robotics, where it is often desirable to gather three-dimensional information about an operating environment of a machine. Stereoscopic imaging simulates the binocular vision of human eyes and applies the principle of stereopsis to achieve depth perception. This technique can be reproduced by artificial imaging devices by viewing a given object of interest using multiple imaging devices from slightly different vantage points. Differences between varying views of the object of interest convey depth information about a position of the object, thereby enabling three-dimensional imaging of the object.

During the shooting of videos or television plays, tracking focal length is a difficult and professional task. Most current focal length tracking devices require experienced focal length tracking operators to adjust the focal length of the cameras in real time based on the monitor screens and the shooting site situations.

For certain imaging applications, manual focal length adjustment is cumbersome and may be impractical where the imaging device is operated remotely. Accordingly, it is desirable that a focal length adjustment system be able to automatically adjust the focal length to track a moving object of interest.

SUMMARY

In accordance with a first aspect disclosed herein, there is set forth a method for automatic focal length adjustment, comprising:

-   determining a distance between an object of interest and an imaging mechanism; and
-   automatically adjusting a focal length of the imaging mechanism according to the determined distance.

In an exemplary embodiment of the disclosed methods, the determining comprises determining the distance between the object of interest in a scene and the imaging mechanism.

In an exemplary embodiment of the disclosed methods, the determining comprises imaging the scene with first and second imaging devices contained in the imaging mechanism.

In an exemplary embodiment of the disclosed methods, the determining further comprises:

-   obtaining a depth map of the scene;
-   selecting the object of interest in the scene; and
-   calculating the distance of the object of interest according to the depth map of the scene.

In an exemplary embodiment of the disclosed methods, the obtaining comprises calculating a disparity of scene images from the first and second imaging devices.

In an exemplary embodiment of the disclosed methods, the calculating the disparity comprises optimizing a global energy function.

In an exemplary embodiment of the disclosed methods, the optimizing the global energy function comprises summing a disparity energy function and a scaled smoothing term.

In an exemplary embodiment of the disclosed methods, the disparity energy function is represented by a Birchfield-Tomasi term.

In an exemplary embodiment of the disclosed methods, the Birchfield-Tomasi term is defined by accumulating the minimum disparity of coordinates for pixels in a first image of the scene captured by the first imaging device and a second image of the scene captured by the second imaging device.

An exemplary embodiment of the disclosed methods further comprises, for all neighbors of a pixel, accumulating scaled trigger functions of a disparity between two neighboring pixels to obtain the smoothing term.

In an exemplary embodiment of the disclosed methods, the accumulating comprises, for all neighbors of a pixel, accumulating scaled trigger functions of a disparity between two neighboring pixels from four domains.

In an exemplary embodiment of the disclosed methods, the smoothing term is obtained by accumulating scaled trigger functions of disparity for all neighbors of each pixel.

An exemplary embodiment of the disclosed methods further comprises:

-   aggregating data terms in a plurality of directions to obtain a directional energy function for each of the directions; and
-   accumulating the directional energy functions in the directions to obtain the energy function.

In an exemplary embodiment of the disclosed methods, the aggregating comprises obtaining energy functions in a predetermined number of directions.

In an exemplary embodiment of the disclosed methods, the aggregating comprises obtaining energy functions in four or eight directions.

In an exemplary embodiment of the disclosed methods, the aggregating data terms comprises obtaining an energy function in a direction by summing a corresponding smoothing term and a dynamic planning in the direction.

In an exemplary embodiment of the disclosed methods, the summing the corresponding smoothing term and the dynamic planning in the direction comprises presenting the dynamic planning in the direction with a recurrence based on the energy functions of its neighbors in this direction.

In an exemplary embodiment of the disclosed methods, the direction comprises a horizontal direction.

In an exemplary embodiment of the disclosed methods, the aggregating data terms in the horizontal direction comprises calculating the energy by a recurrence based on the energy functions of its neighbors in the horizontal direction.

An exemplary embodiment of the disclosed methods further comprises obtaining the best depth.

In an exemplary embodiment of the disclosed methods, the obtaining the best depth comprises seeking the disparity value that minimizes the summation of energy in a plurality of directions.

In an exemplary embodiment of the disclosed methods, the obtaining the best depth comprises seeking the disparity value based on an energy function in one direction.

An exemplary embodiment of the disclosed methods further comprises reducing noise by doing at least one of matching scene images from the first and second imaging devices and identifying respective unique features of the scene images while setting the disparity as −1.

An exemplary embodiment of the disclosed methods further comprises compensating an error based on at least one factor selected from a group consisting of the distance between the central lines of the two imaging devices, an actual distance of two adjacent pixels, the focal length of the two imaging devices and the depth between the object of interest and the first and second imaging devices.

An exemplary embodiment of the disclosed methods further comprises optimizing the depth map by using a non-partial optimizing equation.

An exemplary embodiment of the disclosed methods further comprises obtaining a Jacobi iteration of the non-partial optimizing equation by using a recurrence filtering.

In an exemplary embodiment of the disclosed methods, the selecting the object of interest in the scene comprises receiving outside instructions to select the object of interest.

In an exemplary embodiment of the disclosed methods, the receiving the instructions comprises identifying the object of interest selected on either of scene images from the first or the second imaging device.

In an exemplary embodiment of the disclosed methods, the identifying the object of interest selected comprises sensing a frame on either of the scene images framing in the object of interest or sensing a click on the object of interest on either of the scene images.

In an exemplary embodiment of the disclosed methods, the receiving the outside instructions comprises receiving vocal instructions, optionally a pre-set name of the object of interest, to determine the object of interest.

In an exemplary embodiment of the disclosed methods, the selecting the object of interest in the scene comprises judging under at least one pre-set rule and automatically determining the object of interest based on the judging.

In an exemplary embodiment of the disclosed methods, the judging under the at least one pre-set rule comprises judging if the object is approaching or within a certain distance of the imaging mechanism.

In an exemplary embodiment of the disclosed methods, the automatically adjusting comprises automatically adjusting the focal length of the imaging mechanism in real time with tracking learning detection based on gray level information of the object of interest.

In accordance with another aspect disclosed herein, there is set forth a stereoscopic imaging system configured to perform automatic focal length adjustment in accordance with any one of the above methods.

In accordance with another aspect disclosed herein, there is set forth a focal length adjustment apparatus, comprising:

-   a distance assembly for determining a distance between an object of interest in a scene and an imaging mechanism for imaging the scene; and
-   a focal length assembly for automatically adjusting a focal length of the imaging mechanism according to the determined distance.

In an exemplary embodiment of the disclosed apparatus, the distance assembly is configured to determine the distance between the object of interest in a scene and the imaging mechanism.

In an exemplary embodiment of the disclosed apparatus, the imaging mechanism comprises first and second imaging devices imaging the scene to obtain first and second scene images.

In an exemplary embodiment of the disclosed apparatus, either of the first and second imaging devices is a camera or a sensor.

In an exemplary embodiment of the disclosed apparatus, the first and second imaging devices are selected from a group consisting of laser cameras, infrared cameras, ultrasound cameras and Time-of-Flight cameras.

In an exemplary embodiment of the disclosed apparatus, the first and second imaging devices are Red-Green-Blue (RGB) cameras.

In an exemplary embodiment of the disclosed apparatus, the distance assembly comprises:

-   a depth estimation mechanism for obtaining a depth map of the scene;
-   an object determination mechanism for determining the object of interest in the scene; and
-   a calculating mechanism for calculating distance of the object of interest according to the depth map of the scene.

In an exemplary embodiment of the disclosed apparatus, the depth map is obtained based on a disparity of the first and the second scene images.

In an exemplary embodiment of the disclosed apparatus, the depth estimation mechanism optimizes a global energy function.

In an exemplary embodiment of the disclosed apparatus, the global energy function is defined as a sum of a disparity energy function and a scaled smoothing term.

In an exemplary embodiment of the disclosed apparatus, the disparity energy function comprises a Birchfield-Tomasi data term.

In an exemplary embodiment of the disclosed apparatus, the Birchfield-Tomasi data term is defined based on the minimum disparity of coordinates for pixels in a first image of the scene captured by the first imaging device and a second image of the scene captured by the second imaging device.

In an exemplary embodiment of the disclosed apparatus, the smoothing term adopts an energy function of differential of disparity.

In an exemplary embodiment of the disclosed apparatus, the smoothing term is, for all neighbors of a pixel with the coordinates of (x, y), a summation of scaled trigger functions of a disparity between two neighboring pixels.

In an exemplary embodiment of the disclosed apparatus, the neighbors are pixels from four domains.

In an exemplary embodiment of the disclosed apparatus, the smoothing term is defined based on scaled trigger functions of a disparity for all neighbors of each pixel.

In an exemplary embodiment of the disclosed apparatus, the global energy function is optimized by

-   aggregating data terms in a plurality of directions to obtain a directional energy function for each of the directions; and
-   accumulating the directional energy functions in the directions to obtain the energy function.

In an exemplary embodiment of the disclosed apparatus, the directions comprise a predetermined number of directions.

In an exemplary embodiment of the disclosed apparatus, the predetermined number of directions comprise four or eight directions.

In an exemplary embodiment of the disclosed apparatus, the energy function in one direction is based on a dynamic planning in the direction.

In an exemplary embodiment of the disclosed apparatus, the energy function in one direction is obtained by summing a corresponding smoothing term and the dynamic planning in this direction.

In an exemplary embodiment of the disclosed apparatus, the dynamic planning in the direction is a recurrence based on the energy functions of its neighbors in this direction.

In an exemplary embodiment of the disclosed apparatus, the direction comprises a horizontal direction.

In an exemplary embodiment of the disclosed apparatus, the energy function in the horizontal direction is obtained by a recurrence based on the energy functions of its neighbors in the horizontal direction.

In an exemplary embodiment of the disclosed apparatus, the best depth is obtained by seeking the disparity value that minimizes the summation of energy in a plurality of directions.

In an exemplary embodiment of the disclosed apparatus, the best depth is obtained based on an energy function in one direction.

In an exemplary embodiment of the disclosed apparatus, noise is reduced by matching the first and second scene images and/or identifying respective unique features of the first and second scene images while setting the disparity as −1.

In an exemplary embodiment of the disclosed apparatus, an error is compensated based on at least one factor selected from a group consisting of the distance between the central lines of the two imaging devices, an actual distance of two adjacent pixels, the focal length of the two imaging devices and the depth between the object of interest and the first and second imaging devices.

In an exemplary embodiment of the disclosed apparatus, the depth map is optimized by using a non-partial optimizing equation.

An exemplary embodiment of the disclosed apparatus comprises obtaining a Jacobi iteration of the non-partial optimizing equation by a recurrence filtering.

In an exemplary embodiment of the disclosed apparatus, the object determination mechanism receives outside instructions to determine the object of interest.

In an exemplary embodiment of the disclosed apparatus, the object determination mechanism is configured to identify the object of interest selected on either of the first and the second scene images.

In an exemplary embodiment of the disclosed apparatus, the object determination mechanism is configured to identify the object of interest by at least one of sensing a frame on either of the first and the second scene images to frame in the object of interest and sensing a click on the object of interest on either of the first and the second scene images.

In an exemplary embodiment of the disclosed apparatus, the object determination mechanism receives outside vocal instructions, optionally a pre-set name of the object of interest, to determine the object of interest.

In an exemplary embodiment of the disclosed apparatus, the object determination mechanism automatically determines the object of interest based on a judgment under at least a pre-set rule.

In an exemplary embodiment of the disclosed apparatus, the pre-set rule comprises whether the object of interest is approaching the first and the second imaging devices or is within a certain distance thereof.

In an exemplary embodiment of the disclosed apparatus, the focal length assembly automatically adjusts the focal length of the imaging mechanism in real time with tracking learning detection based on gray level information of the object of interest.

In accordance with another aspect disclosed herein, there is set forth a mobile platform performing in accordance with any one of the above methods.

In accordance with another aspect disclosed herein, there is set forth a mobile platform comprising any one of the above apparatus.

In accordance with another aspect disclosed herein, the above mobile platform is an unmanned aerial vehicle (UAV).

In accordance with another aspect disclosed herein, the above mobile platform is a self-stabilizing platform.

In accordance with another aspect disclosed herein, there is set forth a method for obtaining a depth map of a scene, comprising:

-   capturing a plurality of scene images of the scene; and
-   calculating a disparity of the plurality of scene images.

In an exemplary embodiment of the disclosed methods, the capturing the plurality of scene images comprises capturing the plurality of scene images via first and second imaging devices.

In an exemplary embodiment of the disclosed methods, the calculating the disparity comprises optimizing a global energy function.

In an exemplary embodiment of the disclosed methods, the optimizing the global energy function comprises summing a disparity energy function and a scaled smoothing term.

In an exemplary embodiment of the disclosed methods, the disparity energy function is represented by a Birchfield-Tomasi term.

In an exemplary embodiment of the disclosed methods, the Birchfield-Tomasi term is defined by accumulating a minimum disparity of coordinates for pixels in a first image of the scene captured by the first imaging device and a second image of the scene captured by the second imaging device.

An exemplary embodiment of the disclosed methods further comprises, for all neighbors of a selected pixel, accumulating scaled trigger functions of a disparity between two neighboring pixels to the selected pixel to obtain the smoothing term.

In an exemplary embodiment of the disclosed methods, the accumulating comprises, for all neighbors of the selected pixel, accumulating scaled trigger functions of a disparity between two neighboring pixels from four domains.

In an exemplary embodiment of the disclosed methods, the smoothing term is obtained by accumulating scaled trigger functions of a disparity for all neighbors of each pixel.

An exemplary embodiment of the disclosed methods further comprises:

-   aggregating data terms in a plurality of directions to obtain a directional energy function for each of the directions; and
-   accumulating the directional energy functions in the directions to obtain the energy function.

In an exemplary embodiment of the disclosed methods, the aggregating comprises obtaining energy functions in a predetermined number of directions.

In an exemplary embodiment of the disclosed methods, the aggregating comprises obtaining energy functions in four or eight directions.

In an exemplary embodiment of the disclosed methods, the aggregating data terms comprises obtaining the energy function in a selected direction by summing a corresponding smoothing term and a dynamic planning in the selected direction.

In an exemplary embodiment of the disclosed methods, the summing the corresponding smoothing term and the dynamic planning in the direction comprises presenting the dynamic planning in the direction with a recurrence based on the energy functions of its neighbors in the direction.

In an exemplary embodiment of the disclosed methods, the direction comprises a horizontal direction.

In an exemplary embodiment of the disclosed methods, the aggregating data terms in the horizontal direction comprises calculating the energy by a recurrence based on the energy functions of its neighbors in the horizontal direction.

An exemplary embodiment of the disclosed methods further comprises obtaining a best depth.

In an exemplary embodiment of the disclosed methods, the obtaining the best depth comprises seeking the disparity value that minimizes the summation of energy in a plurality of directions.

In an exemplary embodiment of the disclosed methods, the obtaining the best depth comprises seeking the disparity value based on an energy function in one direction.

An exemplary embodiment of the disclosed methods further comprises reducing noise by doing at least one of matching scene images from the first and second imaging devices and identifying respective unique features of the scene images while setting the disparity as −1.

An exemplary embodiment of the disclosed methods further comprises compensating an error based on at least one factor selected from a group consisting of a distance between central lines of the two imaging devices, an actual distance of two adjacent pixels, a focal length of the two imaging devices and a depth between the object of interest and the first and second imaging devices.

An exemplary embodiment of the disclosed methods further comprises optimizing the depth map by using a non-partial optimizing equation.

An exemplary embodiment of the disclosed methods further comprises obtaining a Jacobi iteration of the non-partial optimizing equation by using a recurrence filtering.

In accordance with another aspect disclosed herein, there is set forth an apparatus for obtaining a depth map of a scene, comprising:

-   an imaging system for capturing a plurality of scene images; and
-   a depth assembly for calculating a disparity of the plurality of scene images.

In an exemplary embodiment of the disclosed apparatus, the imaging system comprises first and second imaging devices.

In an exemplary embodiment of the disclosed apparatus, the depth assembly is configured to optimize a global energy function.

In an exemplary embodiment of the disclosed apparatus, the global energy function is defined as a sum of a disparity energy function and a scaled smoothing term.

In an exemplary embodiment of the disclosed apparatus, the disparity energy function comprises a Birchfield-Tomasi data term.

In an exemplary embodiment of the disclosed apparatus, the Birchfield-Tomasi data term is defined based on a minimum disparity of coordinates for pixels in a first image of the scene captured by the first imaging device and a second image of the scene captured by the second imaging device.

In an exemplary embodiment of the disclosed apparatus, the smoothing term adopts an energy function of differential of disparity.

In an exemplary embodiment of the disclosed apparatus, the smoothing term is, for all neighbors of a selected pixel with the coordinates of (x, y), a summation of scaled trigger functions of a disparity between two neighboring pixels.

In an exemplary embodiment of the disclosed apparatus, the neighbors are pixels from four domains.

In an exemplary embodiment of the disclosed apparatus, the smoothing term is defined based on scaled trigger functions of a disparity for all neighbors of each pixel.

In an exemplary embodiment of the disclosed apparatus, the global energy function is optimized by

-   aggregating data terms in a plurality of directions to obtain a directional energy function for each of the directions; and
-   accumulating the directional energy functions in the directions to obtain the energy function.

In an exemplary embodiment of the disclosed apparatus, the directions comprise a predetermined number of directions.

In an exemplary embodiment of the disclosed apparatus, the predetermined number of directions comprise four or eight directions.

In an exemplary embodiment of the disclosed apparatus, the energy function in one direction is based on a dynamic planning in the direction.

In an exemplary embodiment of the disclosed apparatus, the energy function in one direction is obtained by summing a corresponding smoothing term and the dynamic planning in the direction.

In an exemplary embodiment of the disclosed apparatus, the dynamic planning in the direction is a recurrence based on the energy functions of its neighbors in the direction.

In an exemplary embodiment of the disclosed apparatus, the direction comprises a horizontal direction.

In an exemplary embodiment of the disclosed apparatus, the energy function in the horizontal direction is obtained by a recurrence based on the energy functions of its neighbors in the horizontal direction.

In an exemplary embodiment of the disclosed apparatus, the depth assembly is configured to obtain a best depth by seeking the disparity value that minimizes the summation of energy in a plurality of directions.

In an exemplary embodiment of the disclosed apparatus, the best depth is obtained based on an energy function in one direction.

In an exemplary embodiment of the disclosed apparatus, the depth assembly is configured to reduce noise by matching the plurality of images and/or identifying respective unique features of the plurality of images while setting the disparity as −1.

In an exemplary embodiment of the disclosed apparatus, the depth assembly is configured to compensate an error based on at least one factor selected from a group consisting of a distance between central lines of the two imaging devices, an actual distance of two adjacent pixels, a focal length of the two imaging devices and a depth between the object of interest and the first and second imaging devices.

In an exemplary embodiment of the disclosed apparatus, the depth map is optimized by using a non-partial optimizing equation.

In an exemplary embodiment of the disclosed apparatus, the depth assembly is configured to obtain a Jacobi iteration of the non-partial optimizing equation by a recurrence filtering.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an exemplary top-level block diagram illustrating an embodiment of a focal length adjustment apparatus and an imaging mechanism with a first imaging device and a second imaging device.

FIG. 2 and FIG. 3 show examples of two images of a scene obtained by the first imaging device and second imaging device of FIG. 1.

FIG. 4 is an exemplary block diagram showing an embodiment of the first imaging device and the second imaging device.

FIG. 5 is a detail drawing showing an alternative embodiment of the first imaging device and the second imaging device of FIG. 1, in which the first imaging device and the second imaging device are installed on an unmanned aerial vehicle (UAV).

FIG. 6 schematically illustrates the process of computing a distance between an object of interest and the first and the second imaging devices of FIG. 1 via triangulation.

FIG. 7 is an exemplary top-level block diagram illustrating an embodiment of a system for adjusting focal length, wherein the system includes a distance assembly.

FIG. 8 is an exemplary depth map obtained by the focal length adjustment apparatus of FIG. 1.

FIG. 9 is an exemplary top-level flow chart illustrating an embodiment of a method for focal length adjustment.

FIG. 10 is an exemplary flow chart illustrating a process of determining the distance between the imaging mechanism and the object of interest.

FIGS. 11-13 are exemplary schematic charts for showing errors of the focal length adjustment apparatus of FIG. 1 and the compensation effect for the errors.

It should be noted that the figures are not drawn to scale and that elements of similar structures or functions are generally represented by like reference numerals for illustrative purposes throughout the figures. It also should be noted that the figures are only intended to facilitate the description of the embodiments. The figures do not illustrate every aspect of the described embodiments and do not limit the scope of the present disclosure.

DETAILED DESCRIPTION OF THE EMBODIMENTS

Since currently-available focal length adjustment systems are incapable of providing an automatic focal length adjustment for imaging systems, a focal length adjustment apparatus and method are provided for adjusting focal length automatically and serving as a basis for a wide range of applications, such as applications on unmanned aerial vehicles (UAVs) and other mobile platforms. This result can be achieved, according to one embodiment disclosed herein, by a focal length adjustment apparatus 110 as illustrated in FIG. 1.

FIG. 1 depicts an illustrative embodiment of a focal length adjustment apparatus 110. As shown in FIG. 1, the focal length adjustment apparatus 110 can be coupled with an imaging mechanism 130. The imaging mechanism 130 can generate one or more images of a scene 100 where an object 120 of interest is positioned.

Examples of images 199 of the scene 100 obtained by the imaging mechanism 130 are shown in FIG. 2 and FIG. 3. The generated images of the scene 100 can be processed by the focal length adjustment apparatus 110 to generate a signal for adjusting a focal length of the imaging mechanism 130. The focal length of the imaging mechanism 130 can be adjusted in any suitable manner. In some embodiments, the focal length of the imaging mechanism 130 can be adjusted in real time.

Although shown and described with reference to FIG. 1 as comprising two imaging devices 131, 132 for purposes of illustration only, the imaging mechanism 130 can comprise any suitable number of imaging devices 133. For example, the imaging mechanism 130 can have 2, 3, 4, 5, 6, or even a greater number of imaging devices. For an imaging mechanism 130 having more than two imaging devices, the automatic focal length adjustment illustrated herein can be applied as to any pair of the imaging devices.

The imaging devices 131, 132 of FIG. 1 can be arranged in any desired manner in the imaging mechanism 130. The specific arrangement of the imaging devices 131, 132 can depend on a relevant imaging application. In some embodiments, for example, the imaging devices 131, 132 can be positioned side-by-side so that the imaging devices 131, 132 have parallel optical axes. In other embodiments, the imaging devices 131, 132 can be positioned such that the optical axes of the imaging devices 131, 132 are not parallel.

Each of the imaging devices 131, 132 can sense light and convert the sensed light into electronic signals that can be ultimately rendered as an image. Exemplary imaging devices 131, 132 suitable for use with the focal length adjustment apparatus 110 include, but are not limited to, commercially-available cameras (color and/or monochrome) and camcorders. Suitable imaging devices 131, 132 can include analog imaging devices (for example, video camera tubes) and/or digital imaging devices (for example, charge-coupled device (CCD), complementary metal-oxide-semiconductor (CMOS), N-type metal-oxide-semiconductor (NMOS) imaging devices, and hybrids/variants thereof). Digital imaging devices, for example, can include a two-dimensional array of photosensor elements (not shown) that can each capture one pixel of image information. Either of the imaging devices 131, 132 can be, for example, an electro-optical sensor, a thermal/infrared sensor, a color or monochrome sensor, a multi-spectral imaging sensor, a spectrophotometer, a spectrometer, a thermometer, and/or an illuminometer. Furthermore, either of the imaging devices 131, 132 can be, for example, a Red-Green-Blue (RGB) camera, a laser camera, an infrared camera, an ultrasound camera or a Time-of-Flight camera. The imaging devices 131, 132 can alternatively be of the same type. Similarly, the focal lengths of the imaging devices 131, 132 can be the same and/or different without limitation to the scope of the present disclosure.

An exemplary first imaging device 131 and second imaging device 132 are shown in FIG. 4. A distance D between the first imaging device 131 and the second imaging device 132 can be adjustable depending on an object distance Z (shown in FIG. 6) between the imaging devices 131, 132 and the object 120 of interest. In an embodiment, the first imaging device 131 and the second imaging device 132 can be installed on a portable cradle head 150. Once the object 120 of interest is determined, the focal lengths of the first imaging device 131 and the second imaging device 132 can be adjusted automatically based on the object distance Z. By adjusting the focal lengths of the imaging devices 131, 132, the object 120 of interest can be made clearly visible.

In some embodiments, the focal length adjustment apparatus 110 (shown in FIG. 1) can be physically located adjacent to the imaging mechanism 130 (shown in FIG. 1), in which case data between the focal length adjustment apparatus 110 and the imaging mechanism 130 can be communicated locally. An advantage of local communication is that transmission delay can be reduced to facilitate real-time focal length adjustment, image processing, and parameter calibration. In other embodiments, the focal length adjustment apparatus 110 can be located remotely from the imaging mechanism 130. Remote processing may be adopted, for example, because of weight restrictions or other reasons relating to an operational environment of the focal length adjustment apparatus 110. As a non-limiting example, if the imaging mechanism 130 is mounted aboard a mobile platform, such as an unmanned aerial vehicle (UAV) (shown in FIG. 5), it may be desirable to convey imaging data to a remote terminal (not shown) for centralized processing, such as a ground terminal or base station. Centralized processing may be desirable, for example, where multiple UAVs are imaging a given object 120 of interest in a coordinated fashion. FIG. 5 illustrates an exemplary embodiment of the focal length adjustment apparatus 110 in which the imaging devices 131, 132 are installed on a UAV 400.

Although shown and described as being the UAV 400 in FIG. 5 for exemplary purposes only, the mobile platform can be any kind of mobile platform, including but not limited to any self-stabilizing mobile platform.

Various communication methods can be used for remote communication between the imaging mechanism 130 and the focal length adjustment apparatus 110. Suitable communication methods include, for example, radio, Wireless Fidelity (Wi-Fi), cellular, satellite, and broadcasting. Exemplary wireless communication technologies include, but are not limited to, Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband CDMA (W-CDMA), CDMA2000, IMT Single Carrier, Enhanced Data Rates for GSM Evolution (EDGE), Long-Term Evolution (LTE), LTE Advanced, Time-Division LTE (TD-LTE), High Performance Radio Local Area Network (HiperLAN), High Performance Radio Wide Area Network (HiperWAN), High Performance Radio Metropolitan Area Network (HiperMAN), Local Multipoint Distribution Service (LMDS), Worldwide Interoperability for Microwave Access (WiMAX), ZigBee, Bluetooth, Flash Orthogonal Frequency-Division Multiplexing (Flash-OFDM), High Capacity Spatial Division Multiple Access (HC-SDMA), iBurst, Universal Mobile Telecommunications System (UMTS), UMTS Time-Division Duplexing (UMTS-TDD), Evolved High Speed Packet Access (HSPA+), Time Division Synchronous Code Division Multiple Access (TD-SCDMA), Evolution-Data Optimized (EV-DO), Digital Enhanced Cordless Telecommunications (DECT), and others.

Alternatively and/or additionally, the imaging mechanism 130 can be at least partially incorporated into the focal length adjustment apparatus 110. The imaging mechanism 130 thereby can advantageously serve as a component of the focal length adjustment apparatus 110.

As shown in FIG. 1, the imaging mechanism 130 can interface with the focal length adjustment apparatus 110. For example, the imaging devices 131, 132 of the imaging mechanism 130 can acquire respective images 199 (shown in FIG. 2 and FIG. 3) of the scene 100 and relay the acquired images to the focal length adjustment apparatus 110 locally and/or remotely via a data communication system (not shown). The focal length adjustment apparatus 110 can be configured, for example, to reconstruct a three-dimensional depiction of the object 120 of interest using the two images via stereopsis. The focal length adjustment apparatus 110 thereby can determine whether a focal length adjustment would be advantageous based on the object distance Z between the imaging mechanism 130 and the object 120 of interest and/or convey calibrating signals to the imaging mechanism 130 for a focal length adjustment. Additionally and/or alternatively, the focal length adjustment apparatus 110 can be advantageously configured to automatically calibrate one or more extrinsic parameters for stereoscopic imaging.

Referring now to FIG. 6, the images 199 acquired by the imaging devices 131, 132 can include images 199A and 199B. The images 199A (left, indicated as l in the following equations) and 199B (right, indicated as r in the following equations) can be compared to ascertain the object distance Z between the imaging devices 131, 132 and the object 120 of interest. A method of triangulation can be used to ascertain the object distance Z using a binocular disparity d between the two images 199A and 199B. Specifically, coordinates (X_(i), Y_(i), Z_(i)) of a pixel i in the image 199A (left) can be given as follows:

$X_i = \frac{T}{d}\left( x_i^l - c_x \right)$  (Equation 1)

$Y_i = \frac{T}{d}\left( y_i^l - c_y \right)$  (Equation 2)

$Z_i = \frac{T}{d}\, f$  (Equation 3)

where c_(x) and c_(y) represent respective center coordinates of the imaging devices 131, 132, x_(i) and y_(i) represent the coordinates of the object 120 of interest in one or both of the images 199A (left) and 199B (right), T is the baseline (in other words, the distance between the center coordinates of the imaging devices 131, 132), ƒ is a rectified focal length of the imaging devices 131, 132, i is an index over multiple objects 120 of interest and/or over multiple selected points of an object 120 of interest that can be used to determine the object distance Z, and d is the binocular disparity between the images 199A (l) and 199B (r), represented here as:

d _(i)=x _(i) ^(l)−x _(i) ^(r)  (Equation 4)
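
By way of illustration, the triangulation of Equations 1-4 can be sketched in a few lines of Python. This is a minimal sketch rather than the disclosed implementation; the function name and sample values are hypothetical, and a rectified stereo pair is assumed so that corresponding pixels share the same row.

```python
# Minimal sketch (not from the disclosure): recovering 3D coordinates from a
# rectified stereo pair via Equations 1-4. The function name and the sample
# values below are hypothetical.

def triangulate(x_left, x_right, y, cx, cy, f, T):
    """Return (X, Y, Z) for one pixel pair, per Equations 1-4.

    x_left, x_right: horizontal pixel coordinates in the left/right images
    y: vertical pixel coordinate (shared after rectification)
    cx, cy: image center coordinates
    f: rectified focal length in pixels
    T: baseline, i.e. distance between the imaging devices' centers
    """
    d = x_left - x_right          # binocular disparity (Equation 4)
    if d <= 0:
        raise ValueError("non-positive disparity: point at or beyond infinity")
    X = (T / d) * (x_left - cx)   # Equation 1
    Y = (T / d) * (y - cy)        # Equation 2
    Z = (T / d) * f               # Equation 3, the object distance
    return X, Y, Z

# Example: a 5-pixel disparity, 100 mm baseline, 700-pixel focal length
print(triangulate(325.0, 320.0, 240.0, 320.0, 240.0, 700.0, 100.0))
```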

The focal length adjustment apparatus 110 can include any processing hardware and/or software needed to perform image acquisition, focal length adjustment, calibration, and any other functions and operations described herein. Without limitation, the focal length adjustment apparatus 110 can include one or more general purpose microprocessors (for example, single or multi-core processors), application-specific integrated circuits, application-specific instruction-set processors, graphics processing units, physics processing units, digital signal processing units, coprocessors, network processing units, audio processing units, encryption processing units, and the like. In certain embodiments, the focal length adjustment apparatus 110 can include an image processing engine or media processing unit, which can include specialized hardware for enhancing the speed and efficiency of image capture, filtering, and processing operations. Such operations include, for example, Bayer transformations, demosaicing operations, noise reduction operations, and/or image sharpening/softening operations.

In certain embodiments, the focal length adjustment apparatus 110 can include specialized hardware and/or software for performing focal length adjustment and parameter calibration. For example, specialized hardware and/or software can be provided for functions including, but not limited to, reconstructing a three-dimensional depiction of the object 120 of interest using the two-dimensional images via stereopsis, determining whether a focal length adjustment is needed based on a distance between the imaging mechanism 130 and the object 120 of interest, determining an optimal focal length, and conveying control signals to any components of the focal length adjustment apparatus 110 for focal length adjustment.

In some embodiments, the focal length adjustment apparatus 110 can include one or more additional hardware components (not shown), as desired. Exemplary additional hardware components include, but are not limited to, memories (for example, a random access memory (RAM), static RAM, dynamic RAM, read-only memory (ROM), programmable ROM, erasable programmable ROM, electrically erasable programmable ROM, flash memory, secure digital (SD) card, etc.), and/or one or more input/output interfaces (for example, universal serial bus (USB), digital visual interface (DVI), display port, serial ATA (SATA), IEEE 1394 interface (also known as FireWire), serial, video graphics array (VGA), super video graphics array (SVGA), small computer system interface (SCSI), high-definition multimedia interface (HDMI), audio ports, and/or proprietary input/output interfaces). Without limitation, one or more input/output devices (for example, buttons, a keyboard, keypad, trackball, displays, and a monitor) can also be included in the focal length adjustment apparatus 110, as desired.

In some embodiments, the image acquisition, focal length adjustment, calibration, and any other functions and operations described herein for the focal length adjustment apparatus 110 can be achieved by software running on a conventional processor or a general purpose computer, such as a personal computer. The software can be operated with suitable hardware discussed above as desired. The software, for example, can take any form of source code, object code, executable code and machine readable code. The source code can be written in any form of high-level programming language, including but not limited to, C++, Java, Pascal, Visual Basic and the like.

Turning now to FIG. 7, an exemplary block diagram illustrating an alternative embodiment of the focal length adjustment apparatus 110 of FIG. 1 is shown. The focal length adjustment apparatus 110 of FIG. 7 comprises a distance assembly 701 for determining the object distance Z, that is, a distance between the object 120 of interest in the scene 100 and the imaging mechanism 130. The focal length adjustment apparatus 110 further includes a focal length assembly 702 configured to automatically adjust the focal length of the imaging mechanism 130 according to the distance determined by the distance assembly 701.

As seen from FIG. 7, the distance assembly 701 is shown as comprising a depth estimation mechanism 7011 for obtaining a depth map of the scene 100, an object determination mechanism 7012 for determining the object 120 of interest in the scene 100, and a calculating mechanism 7013 for calculating distance of the object 120 of interest according to the depth map of the scene 100 from the depth estimation mechanism 7011.

In an embodiment of the present disclosure, the depth estimation mechanism 7011 receives a first image 199A (shown in FIG. 2) of the scene 100 from the first imaging device 131 and a second image 199B (shown in FIG. 3) of the scene 100 from the second imaging device 132. Based on the first image 199A and the second image 199B of the scene 100 as shown in FIGS. 2 and 3, the depth estimation mechanism 7011 obtains their disparity, based on which a depth map 800 (shown in FIG. 8) is acquired. The specific operation of the depth estimation mechanism 7011 for obtaining the depth map will be described in detail below with reference to FIG. 8.

An exemplary depth map 800 is depicted in FIG. 8. Each pixel (not shown) in the depth map 800 is associated with a value that represents a distance between the point corresponding to the pixel in the scene 100 (shown in FIG. 1) and the imaging mechanism 130 (shown in FIG. 1). For example, in certain embodiments, a brightness value is utilized to represent a distance between a point in the scene (either on the object 120 of interest or not on the object 120 of interest) and an imaging device which images the scene. Alternatively and/or additionally, different color values can be assigned to pixels to represent the distance. In a brightness value example, as seen from FIG. 8, a brighter area 810 indicates points in the scene 100 with nearer distances to the imaging mechanism 130 (shown in FIG. 1), a darker area 820 indicates points in the scene 100 with farther distances to the imaging mechanism 130, and a grey area 830 indicates points in the scene 100 with distances between the near and far distances. If the object 120 of interest moves in the scene 100, a brightness of the pixels for the object 120 of interest can vary based upon the distance between the object 120 of interest and the imaging mechanism 130. The selected pixels for the object 120 of interest can become brighter when the distance between the object 120 of interest and the imaging mechanism 130 decreases and can become dimmer when the distance increases, as shown in FIG. 8.
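
As a rough illustration of such a brightness-coded depth map, the following sketch (an assumption, not the disclosed implementation; the function name and sample values are hypothetical) maps per-pixel distances to 8-bit gray levels so that nearer points render brighter:

```python
# Minimal sketch: render a depth map as grayscale, nearer -> brighter,
# as in the brightness-value example of FIG. 8. Uses NumPy only.
import numpy as np

def depth_to_brightness(depth_mm: np.ndarray) -> np.ndarray:
    """Map per-pixel distances (mm) to 8-bit brightness values."""
    near, far = depth_mm.min(), depth_mm.max()
    scale = (far - depth_mm) / max(far - near, 1e-9)  # 1.0 at nearest point
    return (255.0 * scale).astype(np.uint8)

# A toy 2x3 depth map: the 1 m point renders white, the 5 m points black.
demo = np.array([[1000.0, 3000.0, 5000.0],
                 [2000.0, 4000.0, 5000.0]])
print(depth_to_brightness(demo))
```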

Returning to FIG. 7, the object determination mechanism 7012 can receive outside instructions to determine the object 120 (shown in FIG. 1) of interest by identifying the object 120 of interest selected in either of the first and the second images of the scene 100 (shown in FIG. 1). Outside instructions can be given by, for example, operators of the focal length adjustment apparatus 110. The selection of the object 120 of interest can be done by framing in the object 120 of interest in either of the first and the second images of the scene 100. The first and the second images of the scene 100 can be displayed on one or more display screens (not shown) to the operator of the focal length adjustment apparatus 110 for selection. Alternatively or additionally, the selection of the object 120 of interest can be performed by clicking the display screen(s) on the object 120 of interest in either of the first and the second images of the scene 100 displayed on the display screen(s). The object determination mechanism 7012 can sense the framing and/or the clicking operation to identify the object 120 of interest being selected.

In another embodiment, the object determination mechanism 7012 can receive outside oral instructions from, for example, an operator of the focal length adjustment apparatus 110. Optionally, the oral instructions can be a pre-set name of the object 120 of interest.

Alternatively and/or additionally, the object determination mechanism 7012 can be enabled to automatically determine the object 120 of interest based on a judgment under at least a pre-set rule. Any rule for the judgment can be set as desired. For example, the pre-set rule may comprise that the object 120 of interest is determined if the object 120 of interest is approaching the first and the second imaging devices 131, 132 and/or if the object 120 of interest is within a certain distance from the first and the second imaging devices 131, 132.

Based on the depth map from the depth estimation mechanism 7011 and the information about the object 120 of interest from the object determination mechanism 7012, the calculating mechanism 7013 can be enabled to calculate the distance between the imaging mechanism 130 and the object 120 of interest. In some embodiments, the calculating mechanism 7013 calculates the distance in real time.

Based on the calculated distance, the focal length assembly 702 can be enabled to automatically adjust the focal length of the imaging mechanism 130, for example, in real time, with a tracking learning detection method based on gray level information of the object 120 of interest serving as initial values.

If a user wants to focus the imaging mechanism 130 on a particular object, for example, the object determination mechanism 7012 can enable the user to draw a frame on the display showing the images of the scene 100 to frame in the object 120 of interest to be tracked. The frame can be of any suitable dimension, size or shape, including, but not limited to, a rectangle, a square, a circle or even an irregular shape. Optionally, the user can be enabled to click the display screen(s) to confirm the selection. By using the depth map 800 (shown in FIG. 8) obtained by the depth estimation mechanism 7011, the calculating mechanism 7013 can be enabled to calculate the distance of the object being selected by the user, and the focal length assembly 702 can adjust the focal length of the focal length adjustment apparatus 110 automatically according to the object distance Z (shown in FIG. 6). In some embodiments, the object distance Z is acquired in real time.

Referring now to FIG. 9, one embodiment of a method 900 for focal length adjustment is illustrated. At 901, an object distance between the imaging mechanism 130 (shown in FIG. 1) and the object 120 (shown in FIG. 1) of interest is determined. The object distance Z (shown in FIG. 6) can be determined by using any of various methods, as desired. In some embodiments, the object distance Z can be determined by using a plurality of imaging devices 133 (shown in FIG. 1) in the imaging mechanism 130 via stereopsis. For example, two imaging devices 131, 132 (shown in FIG. 1) of the imaging mechanism 130 each can acquire an image (shown in FIGS. 2 and 3) of the object 120 of interest, and overlapping portions of the acquired images can be analyzed to assess the depth of the object 120 of interest. Alternatively and/or additionally, the object distance can be acquired using one or more non-stereopsis methods, such as by using a laser and/or using ultrasound. At 902, the focal length of the imaging mechanism 130 is automatically adjusted according to the object distance Z determined in step 901.

A detailed process 901 of determining the distance between the imaging mechanism 130 and the object 120 of interest is illustrated in FIG. 10. At 9011, a depth map of the scene 100 is obtained. Any process of obtaining a depth map may be applied here without limitation. An exemplary process is illustrated in detail below.

In an embodiment, a depth map is obtained by calculating a disparity of scene images from the first and second imaging devices 131, 132. In the field of obtaining a depth map, an energy function is usually computed over a subset of the whole image. In one embodiment, a global energy function is optimized to obtain a disparity global energy. Specifically, the disparity global energy can be calculated by summing a disparity energy function and a scaled smoothing term.

An exemplary optimization can be illustrated by the following equation:

E(d)=E _(d)(d)+pE _(s)(d)  (Equation 5)

wherein d indicates the disparity between the first and the second images of the scene, E_(d)(d) is a data term, which indicates the disparity energy function, E_(s)(d) indicates a smoothing term, and p is a scale factor applied to the smoothing term.

The data term E_(d)(d) comprises a Birchfield-Tomasi data term which can be obtained in accordance with the equation:

E _(d)(d)=Σ_(x,y) E _(d_BT−SAD)(x, y, d(x, y)=d)  (Equation 6)

wherein,

$E_{d_{BT}}(x, y, d(x, y) = d) = \min\{C_1, C_2\}$  (Equation 7)

$C_1 = \min_{x-d-0.5 \leq x' \leq x-d+0.5} \left| I_L(x) - I_R(x') \right|$  (Equation 8)

$C_2 = \min_{x-0.5 \leq x' \leq x+0.5} \left| I_L(x') - I_R(x-d) \right|$  (Equation 9)

$E_{d_{BT-SAD}}(x, y, d(x, y) = d) = \sum \min\{C_1, C_2\}$  (Equation 10)

wherein I_(L) represents the first image of the scene 100 captured by the first imaging device 131 and I_(R) represents the second image of the scene 100 captured by the second imaging device 132, respectively; x in I_(L)(x) represents a horizontal coordinate of a pixel in the first image and x′ in I_(R)(x′) represents a horizontal coordinate of the pixel in the second image; and (x, y) represents coordinates of a pixel in the scene 100.

Introducing the Birchfield-Tomasi data term, a data term often used in image sampling/matching, resolves a problem of incorrect image matching by utilizing a matching precision of sub-pixels; it is a pixel dissimilarity measure that is insensitive to image sampling. The contents of the IEEE Transactions on Pattern Analysis and Machine Intelligence (1998) paper explaining the Birchfield-Tomasi data term are incorporated here by reference. Other data terms may be adopted as the data term in Equation 5 when calculating the disparity global energy.
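
A minimal sketch of the per-pixel Birchfield-Tomasi cost of Equations 7-9 follows. It is illustrative only and not the disclosed code: the function name is hypothetical, and the minimum over the ±0.5 ranges is realized here by linear interpolation at half-pixel positions, which is one common reading of the definition.

```python
# Minimal sketch (an assumption) of the Birchfield-Tomasi dissimilarity of
# Equations 7-9 for one scanline. I_L and I_R are 1-D intensity arrays.
import numpy as np

def bt_cost(I_L: np.ndarray, I_R: np.ndarray, x: int, d: int) -> float:
    """min{C1, C2} of Equation 7 at left-image column x and disparity d."""
    def samples(img, c):
        # intensities at c-0.5, c, c+0.5 via linear interpolation, clamped
        lo = max(c - 1, 0)
        hi = min(c + 1, len(img) - 1)
        return (0.5 * (img[lo] + img[c]), img[c], 0.5 * (img[c] + img[hi]))

    xr = x - d  # matching column in the right image
    C1 = min(abs(I_L[x] - s) for s in samples(I_R, xr))   # Equation 8
    C2 = min(abs(s - I_R[xr]) for s in samples(I_L, x))   # Equation 9
    return min(C1, C2)                                    # Equation 7

row_L = np.array([10.0, 20.0, 40.0, 80.0])
row_R = np.array([20.0, 40.0, 80.0, 90.0])
print(bt_cost(row_L, row_R, x=2, d=1))  # 0.0: exact match on the half-pixel grid
```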

The smoothing term E_(s)(d) can be presented by an energy function of differential of disparity, which can be obtained by, for all neighbors of a pixel with the coordinates of (x, y), summing scaled trigger functions of a disparity between two neighboring pixels. Specifically, the smoothing term E_(s)(d) can be obtained in accordance with the equation as follows:

E _(s)(∇d)=Σ[p ₁ T(|d(x, y)−d(x′, y′)|==1)+p ₂ T(|d(x, y)−d(x′, y′)|>1)]  (Equation 11)

wherein (x, y) represents the coordinates of a pixel and (x′, y′) represents the coordinates of a neighboring pixel; p ₂ and p ₁ are two adjustable weights, usually with p ₂≥p ₁; the summation is over all neighbors of (x, y), for which four domains are usually used; and T is a trigger function which is triggered (equals one) when the condition in the parentheses is true.
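
For illustration, the smoothing term of Equation 11 for a single pixel can be sketched as follows, treating T as an indicator function and the four domains as the 4-connected neighbors; the function name and the default weights are assumptions:

```python
# Minimal sketch (an assumption) of Equation 11 evaluated at one pixel.
import numpy as np

def smoothing_energy(d_map: np.ndarray, x: int, y: int,
                     p1: float = 1.0, p2: float = 4.0) -> float:
    """Sum p1*T(|dd| == 1) + p2*T(|dd| > 1) over the 4-neighbors of (x, y)."""
    h, w = d_map.shape
    E = 0.0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # four domains
        nx, ny = x + dx, y + dy
        if 0 <= nx < w and 0 <= ny < h:
            dd = abs(int(d_map[y, x]) - int(d_map[ny, nx]))
            E += p1 * (dd == 1) + p2 * (dd > 1)
    return E

d_map = np.array([[3, 3, 5],
                  [3, 4, 4]])
print(smoothing_energy(d_map, x=1, y=0))  # neighbors 3, 5, 4 -> 0 + p2 + p1 = 5.0
```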

In order to optimize the smoothing term E_(s)(d), a fast dynamic planning is utilized by aggregating data terms in a plurality of directions to obtain a directional energy function for each of the directions and accumulating the directional energy functions in the directions to obtain the energy function. In an embodiment, four or eight directions are selected to aggregate data terms. Other numbers of directions, such as three, five or ten directions, can be selected without limitation.

In an embodiment, an energy function in one direction can be obtained by summing its corresponding smoothing term and the dynamic planning in this direction. A dynamic planning in one direction can be presented by a recurrence based on the energy functions of its neighbors in this direction. For example, energy functions of one pixel's neighbors in the horizontal direction are presented as L(x−1, y, d), L(x−1, y, d+1), L(x−1, y, d−1) and L(x−1, y, d′). Specifically, an energy function in the horizontal direction is obtained in accordance with the equation:

$\begin{matrix}{{L( {x,y,d} )} = {{E_{s}( {x,y,d} )} + {\min \mspace{11mu} \{ {{L( {{x - 1},y,d} )},{{L( {{x - 1},y,{d - 1}} )} + p_{1}},{{L( {{x - 1},y,{d + 1}} )} + p_{1}},{{\min\limits_{d^{\prime}}\mspace{11mu} {L( {{p - 1},y,d^{\prime}} \}}} + p_{2}}} \}} - {\min\limits_{d^{\prime}}\mspace{11mu} {L( {{x - 1},y,d^{\prime}} )}}}} & ( {{Equation}\mspace{14mu} 12} )\end{matrix}$

wherein the energy received at the coordinates (x, y, d) is defined as L(x, y, d), which is presented as a recurrence based on the energy of neighbors L(x−1, y, d), L(x−1, y, d+1), L(x−1, y, d−1) and L(x−1, y, d′).

Furthermore, a best depth is obtained by seeking the disparity value that minimizes the summation of energy in a plurality of directions. In an embodiment, the best depth is obtained in accordance with the equation:

d*=argmin_(d)ΣL(x, y, d)  (Equation 13)

wherein d* indicates the best depth and L(x, y, d) indicates an energy function in one direction.
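
The left-to-right instance of the recurrence in Equation 12, together with the best-depth selection of Equation 13, can be sketched as below. This is an assumption-laden illustration rather than the disclosed code: cost[y, x, d] stands in for the per-pixel term E_s(x, y, d) of Equation 12, here taken to be a precomputed matching-cost volume (for example, the BT-SAD costs above), and a full implementation would accumulate such aggregations over several directions before taking the argmin.

```python
# Minimal sketch (an assumption) of the dynamic planning of Equation 12 in
# the horizontal direction and the best-depth selection of Equation 13.
import numpy as np

def aggregate_horizontal(cost: np.ndarray, p1: float, p2: float) -> np.ndarray:
    """Return L(x, y, d) of Equation 12, scanning each row left to right."""
    h, w, dmax = cost.shape
    L = np.zeros_like(cost)
    L[:, 0, :] = cost[:, 0, :]                     # recurrence base case
    for x in range(1, w):
        prev = L[:, x - 1, :]                      # L(x-1, y, .)
        prev_min = prev.min(axis=1, keepdims=True) # min_d' L(x-1, y, d')
        minus = np.full_like(prev, np.inf)
        plus = np.full_like(prev, np.inf)
        minus[:, 1:] = prev[:, :-1] + p1           # L(x-1, y, d-1) + p1
        plus[:, :-1] = prev[:, 1:] + p1            # L(x-1, y, d+1) + p1
        jump = prev_min + p2                       # min_d' L(x-1, y, d') + p2
        best = np.minimum(np.minimum(prev, minus), np.minimum(plus, jump))
        L[:, x, :] = cost[:, x, :] + best - prev_min
    return L

def best_disparity(L_sum: np.ndarray) -> np.ndarray:
    """Equation 13: d* = argmin_d of the summed directional energies."""
    return L_sum.argmin(axis=2)

rng = np.random.default_rng(0)
cost = rng.random((4, 6, 8))          # toy 4x6 image with 8 disparity levels
print(best_disparity(aggregate_horizontal(cost, p1=0.1, p2=0.5)))
```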

As an embodiment, some noise is reduced by matching the first and second scene images and/or identifying respective unique features of the first and second scene images while setting the disparity as −1.

The present disclosure also optimizes the depth map obtained by the depth estimation mechanism 7011. As described above, the global energy function is optimized by compensating errors of the depth estimation mechanism 7011. Specifically, the errors can be compensated by the following equations:

$\frac{b}{c_1} = \frac{f}{f + D} \;\Rightarrow\; c_1 = \frac{b(f + D)}{f}$  (Equation 14)

$\frac{c_1}{a} = \frac{x}{x + D + f} \;\Rightarrow\; x = \frac{c_1(D + f)}{a - c_1}$  (Equation 15)

by combining Equations 14 and 15, we get:

$+x = \frac{b(f + D)^2}{af - b(f + D)} \approx \frac{D^2}{a \cdot \frac{f}{b} - D}$  (Equation 16)

on the other hand, we have:

$\frac{b}{c_2} = \frac{f}{f + D - x} \;\Rightarrow\; c_2 = \frac{b(f + D - x)}{f}$  (Equation 17)

$\frac{c_2}{a} = \frac{x}{D + f} \;\Rightarrow\; x = \frac{c_2(D + f)}{a}$  (Equation 18)

By combining Equations 17 and 18, we get:

$\begin{matrix}{-x = \frac{b(f + D)^{2}}{af + b(f + D)} \approx \frac{D^{2}}{a \cdot \frac{f}{b} + D}} & {(\text{Equation 19})}\end{matrix}$

wherein

-   -   f=focal length (mm)
    -   D=depth (the distance between the object and the imaging plane, mm)
    -   a=baseline (the distance between the central lines of the two imaging devices, mm)
    -   b=actual distance between two adjacent pixels (mm)
    -   d=disparity (measured in pixels)

Therefore, the depth estimation errors are within a range of [−x, x], and the estimated depth is in a range of [D−x, D+x]. The above error analysis and compensation are based on the following assumptions. For example, the error estimation assumes that the camera calibration parameters are completely correct and that the average error of the disparity map is a theoretical value of 1 pixel. An actual calibration can introduce errors, and a depth estimation error may exceed 1 pixel. Therefore, the above error data reflects a trend only. As a reference, the average error of the depth estimation over four testing maps for the first-ranked stereo matching process on the Middlebury benchmark is (1.29+0.14+6.47+5.70)/4=3.4 pixels. Even in a situation with no noise, no alteration of light beams, and correct calibration parameters, the current best process presents an average error over all points of 3.4 pixels.
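
As a minimal numerical sketch of Equations 16 and 19 (the parameter values below are hypothetical and chosen only to show the trend; they are not taken from the disclosure):

```python
def depth_error_bounds(f: float, D: float, a: float, b: float) -> tuple[float, float]:
    """Approximate error bounds from Equations 16 and 19 (all lengths in mm).

    +x ~ D^2 / (a*f/b - D) and -x ~ D^2 / (a*f/b + D), so the estimated
    depth falls roughly within [D - x_minus, D + x_plus].
    """
    k = a * f / b  # the recurring quantity a*f/b
    return D**2 / (k + D), D**2 / (k - D)  # (x_minus, x_plus)

# Hypothetical parameters: f = 4 mm, a = 150 mm, b = 0.006 mm.
for D in (3000.0, 6000.0):
    x_minus, x_plus = depth_error_bounds(4.0, D, 150.0, 0.006)
    print(D, round(x_minus, 1), round(x_plus, 1))  # error grows non-linearly with D
```

Consistent with the trends discussed with reference to FIGS. 11 and 12 below, the bounds grow roughly quadratically with the depth D and shrink as the baseline a increases.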

A few schematic charts showing the errors of the focal length adjustment apparatus 110 and the compensation effect for the errors are depicted in FIGS. 11-13.

FIG. 11 illustrates an exemplary relationship between the estimated error of the focal length adjustment apparatus 110 and the measured depth D of the scene 100 when only the measured depth D varies. In other words, the baseline a, the distance b between two adjacent pixels, and the focal length f of Equations 16 and 19 remain constant as the measured depth D changes. As shown in FIG. 11, the horizontal axis represents the measured depth D in mm, and the vertical axis represents the estimated error of the focal length adjustment apparatus 110. As FIG. 11 shows, when the measured depth D changes, the estimated error of the focal length adjustment apparatus 110 also changes, in a non-linear relationship with the measured depth D. For example, when the measured depth D is 3000 mm, the estimated error according to FIG. 11 is about 5 mm, but when the measured depth D increases to 6000 mm, the corresponding estimated error increases to over 20 mm.

Although shown and described as a non-linear relationship in FIG. 11 for exemplary purposes only, the relationship between the estimated error and the measured depth D can be any linear and/or non-linear relationship.

FIG. 12 illustrates an exemplary relationship between the estimated error of the focal length adjustment apparatus 110 and the baseline a when only the baseline a varies. Here, the measured depth D, the distance b between two adjacent pixels, and the focal length f of Equations 16 and 19 remain constant when the baseline a changes. In FIG. 12, the horizontal axis represents the baseline a in mm, and the vertical axis represents the estimated error in mm. As FIG. 12 shows, when the baseline a increases, the estimated error decreases in accordance with a non-linear relationship between the estimated error and the baseline a. For example, when the baseline a is 500 mm, the estimated error according to FIG. 12 can be as high as 35 mm, but when the baseline a increases to 2000 mm, the estimated error decreases to about 8 mm.

Although shown and described as a non-linear relationship in FIG. 12 for exemplary purposes only, the relationship between the estimated error and the baseline a can be any linear and/or non-linear relationship.

FIG. 13 is an illustrative example showing representative corresponding relationships among the image representation symbols and the variables contained in Equations 14-19. In other words, Equations 14-19 can be deduced from the relationships illustrated in FIG. 13. Here, the imaging devices 131, 132 (shown in FIG. 1) are represented by two cameras, Cam1 and Cam2, which have a baseline "a" and a focal length "f".

As shown in FIG. 13, triangles ABO₂ and CDO₂ are similar because AB is parallel to CD. Therefore, we can get Equation 14:

$\frac{b}{c_{1}} = \frac{f}{f + D} \Rightarrow c_{1} = \frac{b(f + D)}{f}.$

In addition, triangles CDE and O₁O₂E are similar because CD is parallel to O₁O₂. Therefore, we can get Equation 15:

$\frac{c_{1}}{a} = \frac{x}{x + D + f} \Rightarrow x = \frac{c_{1}(D + f)}{a - c_{1}}.$

By combining Equations 14 and 15, we can reach Equation 16:

${+x} = \frac{b(f + D)^{2}}{af - b(f + D)} \approx \frac{D^{2}}{a \cdot \frac{f}{b} - D}.$
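
Spelled out for clarity (this intermediate algebra is implicit in the original derivation), substituting c₁ from Equation 14 into Equation 15, dividing through by b, and then neglecting f relative to D gives:

$x = \frac{c_{1}(D + f)}{a - c_{1}} = \frac{\frac{b(f + D)}{f}(D + f)}{a - \frac{b(f + D)}{f}} = \frac{b(f + D)^{2}}{af - b(f + D)} = \frac{(f + D)^{2}}{a \cdot \frac{f}{b} - (f + D)} \approx \frac{D^{2}}{a \cdot \frac{f}{b} - D}.$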

For similar reasons, Equation 19 can be deduced from a combination of a similarity relationship involving triangle AHO₂ and a similarity relationship of triangles FGO₂ and O₁O₂C. In both Equations 16 and 19, D is the actual depth between the scene 100 and the imaging plane, a is the baseline between the two imaging devices 131, 132, b is the distance between two adjacent pixels, and f is the focal length of the imaging devices 131, 132, as shown in FIG. 13.

Although −x and +x are shown and described in FIG. 13 as having the same absolute value for exemplary purposes only, the estimated errors −x and +x can have different absolute values as results of Equations 16 and 19, in which case b can carry different values in each equation.

Based on the characteristics of the estimated errors shown in FIGS. 11 and 12, the depth map can be further optimized by applying a non-partial optimizing equation whose Jacobi iteration can be obtained by a recurrence filtering. Specifically, the non-partial optimizing equation is defined in accordance with the following equation:

$\begin{matrix}{E(d) = \sum \left| d(x, y) - d^{*}(x, y) \right|^{2} + \sum \exp\left( \left| I_{L}(x, y) - I_{L}(x', y') \right| + \left| x' - x \right| + \left| y' - y \right| \right) \left| d(x, y) - d(x', y') \right|} & {(\text{Equation 20})}\end{matrix}$

in which d*(x, y) indicates the optimal depth map and d(x, y) is the estimated depth map; I_L(x, y) represents the intensity of the image; (x, y) are the coordinates of a pixel in the image coordinate system; and (x′, y′) are the coordinates of a pixel adjacent to (x, y) in the same image.
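
The disclosure does not give the iteration itself, so the following Python sketch is only one plausible reading of Equation 20: it treats the smoothness term as a quadratic penalty and assumes a decaying weight exp(−(|ΔI| + spatial distance)) over a four-connected neighborhood, which yields a simple Jacobi-style (recurrence-filtering) update. Both assumptions are labeled in the comments.

```python
import numpy as np

def jacobi_refine(d_est: np.ndarray, intensity: np.ndarray,
                  lam: float = 1.0, iters: int = 50) -> np.ndarray:
    """Edge-aware Jacobi-style refinement of an estimated depth map.

    Assumptions for this sketch (not taken from the disclosure): the
    smoothness term is treated as quadratic, and the neighbor weight is
    exp(-(|I(x,y) - I(x',y')| + |x'-x| + |y'-y|)) over a 4-neighborhood.
    """
    d = d_est.astype(np.float64).copy()
    data = d_est.astype(np.float64)
    for _ in range(iters):
        num = lam * data
        den = np.full(d.shape, lam)
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nd = np.roll(d, (dy, dx), axis=(0, 1))          # neighbor depths
            ni = np.roll(intensity, (dy, dx), axis=(0, 1))  # neighbor intensities
            w = np.exp(-(np.abs(intensity - ni) + abs(dy) + abs(dx)))
            num += w * nd
            den += w
        d = num / den  # Jacobi update: data term vs. weighted neighbor average
    return d
```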

Similarly to the operations performed by the object determination mechanism 7012 and the calculating mechanism 7013, the object 120 of interest is determined in step 9012 of FIG. 10, and the distance between the object 120 of interest and the imaging mechanism 130 is calculated in step 9013 of FIG. 10, which in turn serves as a basis for automatically adjusting the focal length of the imaging mechanism 130.

A stereoscopic imaging system configured to conduct the aforementioned operations to perform automatic focal length adjustment can be obtained according to any embodiment of the present disclosure.

Furthermore, a computer program product comprising instructions for automatically adjusting the focal length of a stereoscopic imaging system having at least two imaging devices in accordance with the aforementioned operations can be obtained according to an embodiment of the present disclosure. In some embodiments, the method for automatically adjusting the focal length according to the present disclosure can be achieved by an ordinary computing device, such as a personal computer and/or a microcomputer.

The disclosed embodiments are susceptible to various modifications and alternative forms, and specific examples thereof have been shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the disclosed embodiments are not to be limited to the particular forms or methods disclosed, but to the contrary, the disclosed embodiments are to cover all modifications, equivalents, and alternatives.

What is claimed is:
1. A method for obtaining a depth map of a scene, comprising: capturing a plurality of scene images of the scene; calculating a disparity of corresponding pixels in the scene images by optimizing a global energy function, wherein the global energy function is optimized based on dynamic planning in a plurality of directions; and obtaining the depth map based on the disparity, wherein the depth map is used to control a mobile platform.
2. The method of claim 1, wherein capturing the plurality of scene images comprises capturing the plurality of scene images via a first imaging device and a second imaging device of an imaging mechanism.
3. The method of claim 2, wherein the imaging mechanism comprises multiple imaging devices including the first imaging device and the second imaging device.
4. The method of claim 2, wherein the method further comprises: determining a distance of an object of interest in the scene according to the depth map of the scene; and adjusting a distance between the first imaging device and the second imaging device based on the distance of the object of interest in the scene.
5. The method of claim 4, wherein the distance of the object of interest is a distance between the object of interest in the scene and the imaging mechanism.
6. The method of claim 4, wherein the object of interest is selected by sensing a frame on any of the scene images framing in the object of interest or sensing a click on the object of interest on any of the scene images.
7. The method of claim 4, wherein the object of interest is approaching or within a certain distance of the imaging mechanism.
8. The method of claim 1, wherein optimizing the global energy function comprises summing a disparity energy function and a scaled smoothing term.
9. The method of claim 8, wherein optimizing the global energy function further comprises obtaining the disparity energy function by accumulating a minimum disparity of coordinates for pixels in the scene images.
10. The method of claim 8, wherein optimizing the global energy function further comprises, for all neighbors of a pixel, accumulating scaled trigger functions of a disparity between two neighboring pixels to obtain the scaled smoothing term.
11. The method of claim 1, wherein optimizing the global energy function comprises: aggregating data terms in the plurality of directions to obtain a directional energy function for each of the directions; and accumulating the directional energy functions in the directions.
12. The method of claim 11, wherein aggregating the data terms comprises presenting the dynamic planning of a pixel in the one of the directions with a recurrence based on the directional energy functions of neighbors of the pixel in the one of the directions.
13. The method of claim 11, wherein aggregating the data terms comprises obtaining the directional energy functions in a predetermined number of directions.
14. The method of claim 11, wherein aggregating the data terms comprises calculating an energy by the recurrence based on the directional energy functions of the neighbors of the pixel in a horizontal direction.
15. The method of claim 1, wherein calculating the disparity of corresponding pixels in the scene images comprises calculating the disparity of corresponding pixels of an object of interest in the scene images.
16. The method of claim 1, wherein the method further comprises optimizing the depth map by using a non-partial optimizing equation.
17. The method of claim 16, wherein the method further comprises obtaining a Jacobi iteration of the non-partial optimizing equation by using a recurrence filtering.
18. The method of claim 1, wherein the method further comprises reducing noise by performing at least one of matching the scene images or identifying respective unique features of the scene images.
19. An apparatus for obtaining a depth map of a scene, comprising: an imaging mechanism for capturing a plurality of scene images; and a depth assembly configured to: calculate a disparity of corresponding pixels in the scene images by optimizing a global energy function, wherein the global energy function is optimized based on dynamic planning in a plurality of directions; and obtain the depth map based on the disparity, wherein the depth map is used to control a mobile platform.
20. The apparatus of claim 19, wherein the imaging mechanism comprises a first imaging device and a second imaging device.